Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UpdatePodSandbox to NRI plugin API #141

Merged
merged 1 commit into from
Feb 19, 2025

Conversation

chrishenzie
Copy link
Contributor

This API is called by the container runtime to update pod sandbox resources when they change. This is for the In-place Pod Resize KEP-1287.

@tallclair
@samuelkarp

@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from 92689e4 to f7e11c2 Compare February 6, 2025 21:08
@chrishenzie chrishenzie changed the title Add UpdatePodSandbox to API Add UpdatePodSandbox to NRI plugin API Feb 6, 2025
@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch 2 times, most recently from 1b158a0 to 33821eb Compare February 6, 2025 21:14
@aojea
Copy link
Contributor

aojea commented Feb 7, 2025

cc @MikeZappa87

This may have some interesting properties for networking plugins

@askervin
Copy link
Contributor

askervin commented Feb 7, 2025

Now that pods can be updated, should we extend SynchronizeResponse with repeated PodSandboxUpdate, too?

@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from 33821eb to 5688820 Compare February 8, 2025 03:05
@klihub
Copy link
Member

klihub commented Feb 8, 2025

Now that pods can be updated, should we extend SynchronizeResponse with repeated PodSandboxUpdate, too?

Good question/point. And if we do that, we should also consider adding an (unsolicited) UpdatePodSandboxes next to UpdateContainers in the protocol.

@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from 5688820 to 7121d70 Compare February 8, 2025 18:38
@chrishenzie
Copy link
Contributor Author

Now that pods can be updated, should we extend SynchronizeResponse with repeated PodSandboxUpdate, too?

Good question/point. And if we do that, we should also consider adding an (unsolicited) UpdatePodSandboxes next to UpdateContainers in the protocol.

Can the container runtime make changes to the Pod sandbox's resources? I thought that was managed (at least for now) by kubelet.

@MikeZappa87
Copy link

MikeZappa87 commented Feb 8, 2025

Now that pods can be updated, should we extend SynchronizeResponse with repeated PodSandboxUpdate, too?

Good question/point. And if we do that, we should also consider adding an (unsolicited) UpdatePodSandboxes next to UpdateContainers in the protocol.

Can the container runtime make changes to the Pod sandbox's resources? I thought that was managed (at least for now) by kubelet.

I am pretty certain if the PodSandboxStatus RPC returns additional labels, annotations that the kubelet doesn't know about a test will fail. I believe I broke that last year :-)

See the comment at https://github.com/kubernetes/cri-api/blob/82e1e9174de30efdb4f82fbb5251c31e4a7ce34a/pkg/apis/runtime/v1/api.proto#L589

@klihub
Copy link
Member

klihub commented Feb 10, 2025

Now that pods can be updated, should we extend SynchronizeResponse with repeated PodSandboxUpdate, too?

Good question/point. And if we do that, we should also consider adding an (unsolicited) UpdatePodSandboxes next to UpdateContainers in the protocol.

Can the container runtime make changes to the Pod sandbox's resources? I thought that was managed (at least for now) by kubelet.

@chrishenzie Not at the moment... And actually considering that pod-level cgroup management is in the kubelet and not in the runtime, I think we try to do this now wrong here. At the NRI level this needs to be an event-type message (no possibility for the plugin to request changes to the pod in response).

And if this is true, (maybe) it would be better to add it as a new Event enumeration and pass to the stub/plugin on the wire as a StateChange event, just like all the other similar ones (or then maybe it wouldn't... see later comment related to StateChange).

@@ -208,6 +209,17 @@ func (r *Adaptation) RunPodSandbox(ctx context.Context, evt *StateChangeEvent) e
return r.StateChange(ctx, evt)
}

// UpdatePodSandbox relays the corresponding CRI request to plugins.
func (r *Adaptation) UpdatePodSandbox(_ context.Context, _ *UpdatePodSandboxRequest) (*UpdatePodSandboxResponse, error) {
Copy link
Member

@klihub klihub Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking more about this, I think this should be an event: unlike for containers, pod cgroup management is in the kubelet. Therefore there is not much point in allowing a plugin to respond with requested changes to the update. There is nothing the runtime could do to make that happen.

Currently all other similar events are passed on the wire using a StateChange with an Event enumeration indicating which event is being passed to the stub/plugin. If we go that route, this would then become just a new enumeration, similar to how PostUpdatePodSandbox is now added. However, if we want to pass to the plugin both the old and the updated pod resources, that's not possible with a StateChange message in its current form (we'd need to add a new field there).

I'd anyway like to update the protocol and unfold all StateChange represented events to separate RPC calls with call-specific messages (here's a WiP branch doing that) before we consider tagging a 1.0 NRI version. Now with that in mind, maybe the best way to go forward with this is to not try to shove it into a new StateChange delivered event but instead have it already as a separate RPC call (like it is now)... Thoughts on that ?

Copy link
Contributor Author

@chrishenzie chrishenzie Feb 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a likelihood that pod-level cgroup management could shift to the container runtime in the future? If so I think it would be more intuitive for this to not be just an event but to also have its own response type (which can be empty for now per your other feedback).

It looks like this method signature would fit in with the approach in your branch so I'd be inclined to not make any changes here, but I'd defer to your preference.

Copy link
Member

@klihub klihub Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree. I think we should stick to the normal RPC signatures on the wire and just have the returned UpdatePodSandboxResponse struct be empty. And in the stub/plugin interface (the handlers) it is probably better to only have a returned error in the signature of the handler, just like you have it there now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a likelihood that pod-level cgroup management could shift to the container runtime in the future? If so I think it would be more intuitive for this to not be just an event but to also have its own response type (which can be empty for now per your other feedback).

I know some folks who'd like to at least give some serious consideration for that to happen in the long run.

Copy link
Member

@mikebrow mikebrow Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubelet is only one CRI user.. albeit the primary case.

I like the idea that certain pod meta fields should be managed only by the owning client, and we can in some cases also have overlay rights to fields owned by physical devices managed by the kernel/said device services. IOW if the container runtime and/or plugin is configured as a manager of a particular device.. then we should keep an unmolested copy of the owning client's pod meta, and a copy of device plugin edits that would otherwise have to be done by a runc proxy. This way we can have kubelet's pod meta, the device plugin pod meta.. and same for containers. We would normally do such overrides in the container runtime and / or runc via container runtime config / runc oci plugins..

@klihub
Copy link
Member

klihub commented Feb 10, 2025

Now that pods can be updated, should we extend SynchronizeResponse with repeated PodSandboxUpdate, too?

Good question/point. And if we do that, we should also consider adding an (unsolicited) UpdatePodSandboxes next to UpdateContainers in the protocol.

@chrishenzie @askervin I think the answer to both of these questions is probably 'no, we should not'. Unlike for containers, pod level cgroup management is done in the kubelet. There is nothing the runtime can do if a plugin requested changes, either unsolicited or in response to an UpdatePodSandbox. So we should not pretend otherwise at the protocol level either.

@MikeZappa87
Copy link

MikeZappa87 commented Feb 10, 2025

We were hoping to get the Pod Labels/Annotations added to the request message. The reason behind this is that if the label is updated on a K8s pod, the container runtime is never updated with this state thus etcd becomes the source of truth. I think adding this will help the NRI ecosystem move service mesh and possibly other network plugins over to NRI.

Links to the fields.

https://github.com/kubernetes/cri-api/blob/82e1e9174de30efdb4f82fbb5251c31e4a7ce34a/pkg/apis/runtime/v1/api.proto#L479

https://github.com/kubernetes/cri-api/blob/82e1e9174de30efdb4f82fbb5251c31e4a7ce34a/pkg/apis/runtime/v1/api.proto#L496

cc: @mikebrow @aojea

@aojea
Copy link
Contributor

aojea commented Feb 11, 2025

We were hoping to get the Pod Labels/Annotations added to the request message. The reason behind this is that if the label is updated on a K8s pod, the container runtime is never updated with this state thus etcd becomes the source of truth. I think adding this will help the NRI ecosystem move service mesh and possibly other network plugins over to NRI.

If we have labels ( and maybe also annotations) information on the PodSandboxXXXXX hooks, then we remove all the need for watching the apiserver on network integrations , like cni plugins and service meshes, it will be a really nice addition that solves a long standing problem, the runtime is the source of truth for the Pod lifecycle on the Node in these hooks, components having to go to the apiserver are always going to race, since the state of the apiserver is the desired state that does not necessarily matches the current state on the runtime (this also solve problems when node get disconnected temporary, or have issues to connect to the control plane)

@klihub
Copy link
Member

klihub commented Feb 11, 2025

@chrishenzie @askervin Looking at my comments from yesterday it's probably warranted to try and give a bit more coherent summary which does not look like a diary entry from an H.P. Lovecraft story:

  • my understanding is that UpdatePodSandbox(Resources) simply informs (the runtime and) NRI plugins about what has happened
  • therefore the plugin can't request changes to the reported update and the protocol/API should not imply otherwise
  • consequently the response message should not include resources
  • let's go with the dedicated new RPC call suggested here (as opposed to using StateChange on the wire)

@klihub
Copy link
Member

klihub commented Feb 11, 2025

We were hoping to get the Pod Labels/Annotations added to the request message. The reason behind this is that if the label is updated on a K8s pod, the container runtime is never updated with this state thus etcd becomes the source of truth. I think adding this will help the NRI ecosystem move service mesh and possibly other network plugins over to NRI.

Links to the fields.

https://github.com/kubernetes/cri-api/blob/82e1e9174de30efdb4f82fbb5251c31e4a7ce34a/pkg/apis/runtime/v1/api.proto#L479

https://github.com/kubernetes/cri-api/blob/82e1e9174de30efdb4f82fbb5251c31e4a7ce34a/pkg/apis/runtime/v1/api.proto#L496

cc: @mikebrow @aojea

@MikeZappa87 @aojea @askervin

Looking at the CRI protocol definition changes in the KEP, it looks to me as if those "annotations" in UpdatePodSandboxResourcesRequest were blindly copied from UpdateContainerResourcesRequest simply replacing container with sandbox in the comments. Currently they are defined/described as

// Unstructured key-value map holding arbitrary additional information for
// sandbox resources updating. This can be used for specifying experimental
// resources to update or other options to use when updating the sandbox.
map<string, string> annotations = 4;

This seems to imply to me different semantics than updating the the pods annotations. Rather optionally annotating (how to apply) the updates instead.

Looking at the runtime implementations for UpdateContainerResources, I think the corresponding container resource update annotations are simply ignored and seem more like a protocol relic than something actually in use.

If we'd prefer not to carry over (maybe even only semi-accidentally) a relic to the new UpdatePodSandboxResources CRI RPC call, but instead would like to use it for actually updating pod annotations, it would be good to bring this up in the KEP. And probably also mention the desired addition of labels as well.

@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch 2 times, most recently from 409373f to 21fec05 Compare February 12, 2025 05:34
@aojea
Copy link
Contributor

aojea commented Feb 12, 2025

@klihub agree, annotations are good for experimentation but then become an ugly API and extension point creating tightly coupling between the implementations ... we need to think about something better, I will check with the DRA folks, since this is about resources we may model this problem as a DRA problem that has better semantics to support what we want

@klihub
Copy link
Member

klihub commented Feb 12, 2025

@klihub agree, annotations are good for experimentation but then become an ugly API and extension point creating tightly coupling between the implementations ... we need to think about something better, I will check with the DRA folks, since this is about resources we may model this problem as a DRA problem that has better semantics to support what we want

@aojea Well, I meant something much simpler with my comment (and I think being able to propagate pod annotation/label updates to the runtime could be useful). All I wanted to say is that if we want to use the annotations in that message to update pod annotations, then we should bring this up in the KEP discussion. And also change the related comment in the protocol description, because it looks like something which was simply copied from the existing UpdateContainerResourcesRequest and the comments imply different usage/semantics.

@tallclair
Copy link

Following along the discussion of annotations, I think we should probably just drop the annotations field from the UpdatePodSandboxResources request for now. We are not currently using them for the analogous container request, and we don't support updating extended resources today. I would prefer to hold off on adding these until we have a specific use case that calls for them.

If we'd prefer not to carry over (maybe even only semi-accidentally) a relic to the new UpdatePodSandboxResources CRI RPC call, but instead would like to use it for actually updating pod annotations, it would be good to bring this up in the KEP. And probably also mention the desired addition of labels as well.

I don't think tacking these onto the UpdatePodSandboxResources request is the right approach. If we want to pass updates to these annotations through the CRI, we should consider a dedicated request, or a generic UpdatePodSandbox request. For now, I'd like to keep the use case of UpdatePodSandoxResources more targeted.

@klihub
Copy link
Member

klihub commented Feb 13, 2025

@chrishenzie I think we'd still need to get rid of the usage of StateChange on the adaptation side implementation of UpdatePodSandbox. Maybe, something like this: klihub@7b1b3e0

@mikebrow
Copy link
Member

We were hoping to get the Pod Labels/Annotations added to the request message. The reason behind this is that if the label is updated on a K8s pod, the container runtime is never updated with this state thus etcd becomes the source of truth. I think adding this will help the NRI ecosystem move service mesh and possibly other network plugins over to NRI.

Links to the fields.

https://github.com/kubernetes/cri-api/blob/82e1e9174de30efdb4f82fbb5251c31e4a7ce34a/pkg/apis/runtime/v1/api.proto#L479

https://github.com/kubernetes/cri-api/blob/82e1e9174de30efdb4f82fbb5251c31e4a7ce34a/pkg/apis/runtime/v1/api.proto#L496

cc: @mikebrow @aojea

Annotations is a good pattern to explore the thought that we should have a set of client owner annotations reported through the cri apis to the CRI client and another set of annotations that overlay (add) to the set managed under the client at the device layer. We could report the additional annotations to the client in a new field or in the info map.

@mikebrow
Copy link
Member

mikebrow commented Feb 13, 2025

Following along the discussion of annotations, I think we should probably just drop the annotations field from the UpdatePodSandboxResources request for now. We are not currently using them for the analogous container request, and we don't support updating extended resources today. I would prefer to hold off on adding these until we have a specific use case that calls for them.

If we'd prefer not to carry over (maybe even only semi-accidentally) a relic to the new UpdatePodSandboxResources CRI RPC call, but instead would like to use it for actually updating pod annotations, it would be good to bring this up in the KEP. And probably also mention the desired addition of labels as well.

I don't think tacking these onto the UpdatePodSandboxResources request is the right approach. If we want to pass updates to these annotations through the CRI, we should consider a dedicated request, or a generic UpdatePodSandbox request. For now, I'd like to keep the use case of UpdatePodSandoxResources more targeted.

Nod... IMO any need to keep "extra" annotations, and/or ENVs, between the plugins and devices should kept in the container runtimes pod/container store as extras and either not reported to the client or if reported then as additional fields and/or placed temporarily in info maps.. Trying to avoid said overrides/additions being unmanaged through oci hooks.

This API is called by the container runtime to update pod sandbox
resources when they change. This is for the In-place Pod Resize
KEP-1287.

Signed-off-by: Chris Henzie <[email protected]>
@chrishenzie chrishenzie force-pushed the update-pod-sandbox-api branch from 21fec05 to e4ce8c1 Compare February 13, 2025 18:01
@MikeZappa87
Copy link

Following along the discussion of annotations, I think we should probably just drop the annotations field from the UpdatePodSandboxResources request for now. We are not currently using them for the analogous container request, and we don't support updating extended resources today. I would prefer to hold off on adding these until we have a specific use case that calls for them.

If we'd prefer not to carry over (maybe even only semi-accidentally) a relic to the new UpdatePodSandboxResources CRI RPC call, but instead would like to use it for actually updating pod annotations, it would be good to bring this up in the KEP. And probably also mention the desired addition of labels as well.

I don't think tacking these onto the UpdatePodSandboxResources request is the right approach. If we want to pass updates to these annotations through the CRI, we should consider a dedicated request, or a generic UpdatePodSandbox request. For now, I'd like to keep the use case of UpdatePodSandoxResources more targeted.

Nod... IMO any need to keep "extra" annotations, and/or ENVs, between the plugins and devices should kept in the container runtimes pod/container store as extras and either not reported to the client or if reported then as additional fields and/or placed temporarily in info maps.. Trying to avoid said overrides/additions being unmanaged through oci hooks.

We should keep in mind that K8s and the container runtime may have different states for the labels as you can apply a label to a running pod and the container runtime is never updated.

@klihub klihub self-requested a review February 14, 2025 07:27
Copy link
Contributor

@askervin askervin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@klihub klihub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@fuweid fuweid merged commit e597e78 into containerd:main Feb 19, 2025
8 checks passed
@chrishenzie chrishenzie deleted the update-pod-sandbox-api branch February 19, 2025 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants